Analytical approximations to conditional distribution functions

By  Thomas  J.  DiCiccio,  Michael  A.  Martin 
and  G.  Alastair  Young 

SUMMARY.  Conditional  inference  plays  a  central  role  in  statistics,  but  determination  of 
relevant  conditional  distributions  is  often  difficult.  We  develop  analytical  procedures  that 
are  accurate  and  easy  to  apply  for  approximating  conditional  distribution  functions.  For 
a continuous random vector $X = (X^1, \ldots, X^p)$, we estimate conditional tail probabilities

$$\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^k = a^k), \qquad k \le p,$$

where $Y^i = g^i(X^1, \ldots, X^p)$ $(i = 1, \ldots, k)$ and $g^1, \ldots, g^k$ are smooth functions of $X$.
Previous approaches have dealt with the cases where the variable whose conditional distribution
is sought is a linear function of means, and where there are $p - 1$ conditioning variables.
However, in many practical circumstances the statistic of interest is a nonlinear function of
means and it is advantageous to condition on a lower-dimensional ancillary statistic. Our
procedure first involves approximating the marginal density function
$f_{Y^1,\ldots,Y^k}(y^1, \ldots, y^k)$ for $Y^1, \ldots, Y^k$, by an approach of Phillips (1983)
and Tierney, Kass and Kadane (1989). An accurate approximation to the required conditional
probability is then obtained by applying a marginal tail probability approximation of DiCiccio
and Martin (1991) to the conditional density of $Y^1$ given $Y^2, \ldots, Y^k$, which satisfies

$$f_{Y^1 \mid Y^2,\ldots,Y^k}(y^1 \mid Y^2 = a^2, \ldots, Y^k = a^k) \propto f_{Y^1,\ldots,Y^k}(y^1, a^2, \ldots, a^k).$$


Our  method  is  illustrated  in  several  examples,  including  one  which  uses  a  saddlepoint 
approximation  for  the  density  of  X,  and  the  method  is  applied  for  conditional  bootstrap 
inference. 


Some  key  words:  Ancillary  statistic;  Conditional  bootstrap;  Laplace’s  method;  Marginal 
density;  Saddlepoint  approximation;  Tail  probability  approximation. 
1. Introduction

Conditional  distributions  play  a  key  role  in  many  inference  problems,  largely  through 
the  use  of  the  conditionality  principle  and  notions  such  as  ancillarity.  Unfortunately,  it 
is  often  difficult  or  impossible  to  compute  exact  conditional  distributions,  and  standard 
approximation  methods  often  fail  to  work  or  are  difficult  to  adapt  to  the  situation  at  hand. 
For  example,  Edgeworth  expansions  can  yield  negative  probability  estimates  in  the  tails  of 
a  distribution,  and  saddlepoint  methods  based  on  cumulant  generating  functions  are  only 
easily  applied  when  the  variables  of  interest  are  means. 

Skovgaard (1987) investigated the use of saddlepoint methods in the case of a bivariate
mean to develop analytical approximations to the conditional distribution of one
mean given the other. He extended his method to the case of $p$ means, approximating the
conditional distribution of a linear function of the means given a $(p - 1)$-dimensional linear
function of them. Wang (1991) extended Skovgaard's results further to include the case
of approximating the conditional distribution of a mean given $p - 1$ nonlinear functions
of the means. Davison and Hinkley (1988) applied Skovgaard's approach in a conditional
bootstrap context. They extended Skovgaard's results to include the conditional distribution
of certain functions of the means given a $(p - 1)$-dimensional linear function of them,
where the functions in question lead to statistics which are solutions of linear estimating
equations. The techniques of Skovgaard and Wang have several elements in common that
may limit their applicability. First, because they are based on saddlepoint approximations,
their methods require knowledge of the cumulant generating function of the entire random
vector of interest. Second, their techniques restrict the variable whose conditional distribution
is sought to be a linear function of means, or at least to be a function of means
identified with a linear estimating equation. This restriction can be quite severe in practice.
Finally, and most importantly, the number of conditioning variables is necessarily
$p - 1$. However, in many cases of practical interest, an ancillary exists that is of lower
dimension than $p - 1$, and interest centers on distributions conditional on that ancillary.

In this paper, we develop an analytical approximation to conditional tail probabilities
for a smooth function of a random vector $X = (X^1, \ldots, X^p)$ given $k - 1$ other smooth
functions of $X$, where $k \le p$. The vector $X$ is not restricted to a vector of means; we assume
only that its joint density function is of the form $c\,b(x)\exp\{\ell(x)\}$. Also, the variable whose
conditional distribution is sought may be a smooth, non-linear function of $X$, giving our
method considerable generality. Moreover, our method allows the dimension of the conditioning
variable to be less than $p - 1$, so that a lower-dimensional ancillary statistic may
be conditioned on if it exists. Our technique produces accurate approximate conditional
tail probabilities, and is based on applying DiCiccio and Martin's (1991) tail probability
approximation to a marginal density approximation given by Phillips (1983) and Tierney,
Kass and Kadane (1989), by noting that the required conditional density is proportional to
the marginal density for fixed values of the conditioning variables. A secondary, theoretical
contribution of the paper is to show that the approaches of Phillips (1983) and Tierney,
Kass and Kadane (1989) for developing marginal density approximations are equivalent.

An important feature of our approximation is that it avoids costly numerical integration.
An obvious alternative to our method is numerical integration of
a renormalized version of the conditional density approximation that arises in developing
our technique. The first obstacle to implementing this approach is that renormalization
requires the computation of a second numerical integral. Moreover, both numerical integration
steps are practically infeasible because each density function evaluation requires a
potentially costly constrained maximization step. In contrast, application of our method
requires only four function evaluations.

Section 2 of the paper describes our theoretical results. In Section 3, we discuss
computational aspects of our method and provide a formal algorithm for its use. Examples
of the use of our method are given in Section 4, including an application to the conditional
bootstrap.


2.  Conditional  Tail  Probability  Approximation 


Consider a continuous random vector $X = (X^1, \ldots, X^p)$ having probability density
function of the form

$$f_X(x) = c\,b(x)\exp\{\ell(x)\}, \qquad x = (x^1, \ldots, x^p),$$

and let $\hat{x} = (\hat{x}^1, \ldots, \hat{x}^p)$ be the point maximizing $\ell(x)$, and suppose that
$X - \hat{x}$ is $O_p(n^{-1/2})$ as $n \to \infty$, where $n$ is sample size. For each fixed $x$,
assume that $\ell(x)$ and its partial derivatives are $O(n)$. We are interested in approximating
conditional tail probabilities

$$\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^k = a^k), \qquad k \le p,$$

where $a^2, \ldots, a^k$ are fixed constants and $Y^i = g^i(X^1, \ldots, X^p)$ $(i = 1, \ldots, k)$ for functions
$g^1, \ldots, g^k$ which are assumed to have continuous gradients that do not vanish in an
$n^{-1/2}$-neighbourhood of $\hat{x}$.

In order to study the conditional distribution of $Y^1$ given $Y^2, \ldots, Y^k$, we first consider
an approximation to the marginal density of $Y^1, \ldots, Y^k$. Two approaches to estimating
this marginal density are given by Phillips (1983) and Tierney, Kass and Kadane (1989).
Both approaches utilize Laplace's method of approximating integrals to avoid the need
for high-dimensional integration, and it is shown here that they yield the same marginal
density approximation; see the Appendix. We will use elements of both approaches to
describe our method, so we now briefly describe each approach.

Phillips (1983) assumes a 1-1 transformation

$$Y = (Y^1, \ldots, Y^p) = \{g^1(X^1, \ldots, X^p), \ldots, g^p(X^1, \ldots, X^p)\}$$

of $X$, where the variables of interest are $Y^1, \ldots, Y^k$, and the functions $g^{k+1}, \ldots, g^p$ are
smooth and have non-zero gradients in an $n^{-1/2}$-neighbourhood of $\hat{x}$. Let $J\{x(y)\}$ denote
the Jacobian of this transformation. Then the probability density function of $Y$ is of the
form

$$f_Y(y) = c\,\bar{b}(y)\exp\{\bar{\ell}(y)\}, \qquad y = (y^1, \ldots, y^p),$$

where $\bar{b}(y) = b\{x(y)\}/\det[J\{x(y)\}]$ and $\bar{\ell}(y) = \ell\{x(y)\}$. The marginal density of
$Y^1, \ldots, Y^k$ is then

$$f_{Y^1,\ldots,Y^k}(y^1, \ldots, y^k)
= c\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \bar{b}(y)\exp\{\bar{\ell}(y)\}\,dy^{k+1}\cdots dy^p
= \frac{\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \bar{b}(y)\exp\{\bar{\ell}(y)\}\,dy^{k+1}\cdots dy^p}
       {\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \bar{b}(y)\exp\{\bar{\ell}(y)\}\,dy^{1}\cdots dy^p}. \tag{1}$$

Let $\hat{y}$ be the value of $y$ maximizing $\bar{\ell}(y)$, and let $\tilde{y} = \tilde{y}(y^1, \ldots, y^k)$ be the value of $y$
maximizing $\bar{\ell}(y)$ subject to the first $k$ components of $y$ being held fixed at the values
$y^1, \ldots, y^k$. Let $\bar{\ell}_i(y) = \partial\bar{\ell}(y)/\partial y^i$, $\bar{\ell}_{ij}(y) = \partial^2\bar{\ell}(y)/\partial y^i\partial y^j$ $(i, j = 1, \ldots, p)$. Applying
Laplace's method to the numerator and denominator of (1), an approximation to the
marginal density of $Y^1, \ldots, Y^k$ is

$$f_{Y^1,\ldots,Y^k}(y^1, \ldots, y^k) \approx (2\pi)^{-k/2}
\left[\frac{\det\{\Omega(\hat{y})\}}{\det\{\Omega'(\tilde{y})\}}\right]^{1/2}
\frac{\bar{b}(\tilde{y})}{\bar{b}(\hat{y})}
\exp\{\bar{\ell}(\tilde{y}) - \bar{\ell}(\hat{y})\}, \tag{2}$$

where $\Omega(y)$ is the $p \times p$ matrix whose $(i,j)$th element is $-\bar{\ell}_{ij}(y)$, and $\Omega'(y)$ is the
$(p - k) \times (p - k)$ submatrix of $\Omega(y)$ corresponding to pairs $(i,j)$ where $i, j = k + 1, \ldots, p$. An
apparent problem with approximation (2) is that it requires specification of $p - k$ new
functions $g^{k+1}, \ldots, g^p$ of $X$, the choice of which may affect the accuracy of (2).
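As a minimal illustration of the Laplace step that underlies (2), the following Python sketch approximates a one-dimensional integral of the form $\int \exp\{\ell(x)\}\,dx$ by $\exp\{\ell(\hat{x})\}\{2\pi/(-\ell''(\hat{x}))\}^{1/2}$. The integrand $x^n e^{-x}$ and the function name `laplace_integral` are assumptions chosen only because the exact value, $n!$, is available for comparison; this is not the multivariate formula (2) itself.

```python
import math

def laplace_integral(ell, d2_ell, mode):
    """Laplace approximation to the integral of exp{ell(x)}.

    `mode` is the maximizer of ell and `d2_ell` its second derivative;
    both are supplied by the caller in this toy sketch.
    """
    return math.exp(ell(mode)) * math.sqrt(2.0 * math.pi / -d2_ell(mode))

n = 10
ell = lambda x: n * math.log(x) - x        # exp{ell(x)} = x^n e^{-x}
d2_ell = lambda x: -n / x ** 2             # second derivative of ell
approx = laplace_integral(ell, d2_ell, mode=float(n))  # ell is maximized at x = n
exact = math.gamma(n + 1)                  # integral over (0, inf) equals n!
ratio = approx / exact
```

The ratio is close to one and improves as $n$ grows, which mirrors the role of sample size in the accuracy statements made for the approximations in this section.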

Tierney, Kass and Kadane (1989) provide a formula for the approximate marginal
density of $Y^1, \ldots, Y^k$ that does not require specification of $g^{k+1}, \ldots, g^p$. To describe
their formula, we first need additional notation. Let $\tilde{x}$ be the value of $x$ maximizing $\ell(x)$
subject to the constraints $g^1(x^1, \ldots, x^p) = y^1, \ldots, g^k(x^1, \ldots, x^p) = y^k$, and let $H(x)$ be
the Lagrangian for this constrained maximization,

$$H(x) = \ell(x) + \lambda_\alpha\{g^\alpha(x) - y^\alpha\},$$

where $\lambda_\alpha = \lambda_\alpha(y^1, \ldots, y^k)$ $(\alpha = 1, \ldots, k)$ and the usual summation convention applies
whereby summation is assumed over indices appearing as both a subscript and a superscript.
Let $\ell_i(x) = \partial\ell(x)/\partial x^i$, $\ell_{ij}(x) = \partial^2\ell(x)/\partial x^i\partial x^j$, $H_{ij}(x) = \partial^2 H(x)/\partial x^i\partial x^j$,
$g^\alpha_i(x) = \partial g^\alpha(x)/\partial x^i$, $g^\alpha_{ij}(x) = \partial^2 g^\alpha(x)/\partial x^i\partial x^j$ $(i, j = 1, \ldots, p;\ \alpha = 1, \ldots, k)$ denote the
partial derivatives of $\ell$, $H$ and $g^\alpha$, respectively. Define the $p \times p$ matrices $\Lambda(x) = \{-\ell^{ij}(x)\}$,
the inverse of the matrix whose $(i,j)$th element is $-\ell_{ij}(x)$, and $\tilde{\Lambda}(x) = \{-H^{ij}(x)\}$, the
inverse of the matrix whose $(i,j)$th element is $-H_{ij}(x)$ $(i, j = 1, \ldots, p)$, and the $k \times k$ matrix
$\Theta(x)$ whose $(\alpha,\beta)$th element is $-H^{ij}(x)g^\alpha_i(x)g^\beta_j(x)$ $(\alpha, \beta = 1, \ldots, k)$. Then, Tierney,
Kass and Kadane's approximation to the marginal density of $Y^1, \ldots, Y^k$ is

$$f_{Y^1,\ldots,Y^k}(y^1, \ldots, y^k) \approx (2\pi)^{-k/2}
\left[\frac{\det\{\tilde{\Lambda}(\tilde{x})\}}{\det\{\Lambda(\hat{x})\}\det\{\Theta(\tilde{x})\}}\right]^{1/2}
\frac{b(\tilde{x})}{b(\hat{x})}
\exp\{\ell(\tilde{x}) - \ell(\hat{x})\}. \tag{3}$$

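Proposition 1. Approximations (2) and (3) to the marginal density of $Y^1, \ldots, Y^k$ are
equivalent.

Proposition 1 is proved in the Appendix. In particular, it is shown there that

$$\left[\frac{\det\{\Omega(\hat{y})\}}{\det\{\Omega'(\tilde{y})\}}\right]^{1/2}\frac{\bar{b}(\tilde{y})}{\bar{b}(\hat{y})}
= \left[\frac{\det\{\tilde{\Lambda}(\tilde{x})\}}{\det\{\Lambda(\hat{x})\}\det\{\Theta(\tilde{x})\}}\right]^{1/2}\frac{b(\tilde{x})}{b(\hat{x})}, \tag{4}$$

$$\bar{\ell}(\tilde{y}) - \bar{\ell}(\hat{y}) = \ell(\tilde{x}) - \ell(\hat{x}). \tag{5}$$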

Our rationale in what follows is that

$$f_{Y^1 \mid Y^2,\ldots,Y^k}(y^1 \mid Y^2 = a^2, \ldots, Y^k = a^k) \propto f_{Y^1,\ldots,Y^k}(y^1, a^2, \ldots, a^k),$$

so that we may write

$$f_{Y^1 \mid Y^2,\ldots,Y^k}(y^1 \mid Y^2 = a^2, \ldots, Y^k = a^k) \propto b^*(y^1)\exp\{\ell^*(y^1)\} \tag{6}$$

for suitably defined functions $\ell^*$ and $b^*$. We apply the DiCiccio and Martin (1991) tail
probability formula to obtain approximations to conditional tail probabilities
$\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^k = a^k)$. Fix the values of $y^2, \ldots, y^k$ in the preceding discussion at
their conditioned values $a^2, \ldots, a^k$, respectively. Then $\tilde{y} = \tilde{y}(y^1, a^2, \ldots, a^k)$ is a function
of $y^1$ alone, and is the value of $y$ maximizing $\bar{\ell}(y)$ subject to the first $k$ components of $y$
being fixed at the values $y^1, a^2, \ldots, a^k$, respectively. Analogously, $\tilde{x} = \tilde{x}(y^1, a^2, \ldots, a^k)$ is
a function of $y^1$, and is the value of $x$ maximizing $\ell(x)$ subject to the constraints
$g^1(x) = y^1, g^2(x) = a^2, \ldots, g^k(x) = a^k$. Then $b^*(y^1)$ in (6) is given by

$$b^*(y^1)
= \left(\frac{\det\{\Omega(\hat{y})\}}{\det[\Omega'\{\tilde{y}(y^1, a^2, \ldots, a^k)\}]}\right)^{1/2}
  \frac{\bar{b}\{\tilde{y}(y^1, a^2, \ldots, a^k)\}}{\bar{b}(\hat{y})}
= \left(\frac{\det[\tilde{\Lambda}\{\tilde{x}(y^1, a^2, \ldots, a^k)\}]}{\det\{\Lambda(\hat{x})\}\det[\Theta\{\tilde{x}(y^1, a^2, \ldots, a^k)\}]}\right)^{1/2}
  \frac{b\{\tilde{x}(y^1, a^2, \ldots, a^k)\}}{b(\hat{x})} \tag{7}$$

and $\ell^*(y^1)$ is given by

$$\ell^*(y^1) = \bar{\ell}\{\tilde{y}(y^1, a^2, \ldots, a^k)\} - \bar{\ell}(\hat{y}) = \ell\{\tilde{x}(y^1, a^2, \ldots, a^k)\} - \ell(\hat{x});$$


see (4) and (5), respectively. DiCiccio and Martin's (1991) tail probability approximation
for densities of the form (6) is

$$\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^k = a^k) \approx \Phi(r) + \phi(r)
\left[\frac{1}{r} + \frac{\{-\ell^{*(2)}(\check{y}^1)\}^{1/2}}{\ell^{*(1)}(a^1)}\,\frac{b^*(a^1)}{b^*(\check{y}^1)}\right], \tag{8}$$

where $\check{y}^1$ maximizes $\ell^*(y^1)$, $r = \mathrm{sgn}(a^1 - \check{y}^1)[2\{\ell^*(\check{y}^1) - \ell^*(a^1)\}]^{1/2}$,
$\ell^{*(1)}(y^1) = d\ell^*(y^1)/dy^1$ and $\ell^{*(2)}(y^1) = d^2\ell^*(y^1)/d(y^1)^2$ denote the first two derivatives
of $\ell^*(y^1)$, and $\Phi$ and $\phi$ denote the standard normal distribution and density functions,
respectively. A simpler approximation to the required conditional probability is to just use
the leading term of (8); that is,

$$\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^k = a^k) \approx \Phi(r). \tag{9}$$

This alternative approximation is much easier to compute than the full approximation
(8), but it is also significantly less accurate in our experience. Typically, the error in
approximation (8) is of order $O(n^{-3/2})$, while the error in approximation (9) is of order
$O(n^{-1/2})$.
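To make the structure of (8) and (9) concrete, here is a minimal Python sketch for a generic one-dimensional density proportional to $b(y)\exp\{\ell(y)\}$. The function name `tail_probability` and the gamma test density are assumptions introduced for illustration only; the caller is assumed to supply the mode and the derivatives of $\ell$ analytically.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tail_probability(a, mode, ell, d_ell, d2_ell, b=lambda y: 1.0):
    """Return (full, leading): the form of (8) and its leading term (9).

    Assumes a != mode, since r = 0 at the mode and 1/r is undefined there.
    """
    r = math.copysign(math.sqrt(2.0 * (ell(mode) - ell(a))), a - mode)
    leading = std_normal_cdf(r)
    correction = math.sqrt(-d2_ell(mode)) / d_ell(a) * b(a) / b(mode)
    full = leading + std_normal_pdf(r) * (1.0 / r + correction)
    return full, leading

# Hypothetical check: gamma density with shape 5, written as exp{ell(y)} with b = 1.
shape = 5
ell = lambda y: (shape - 1) * math.log(y) - y
d_ell = lambda y: (shape - 1) / y - 1.0
d2_ell = lambda y: -(shape - 1) / y ** 2
a = 2.0
full, leading = tail_probability(a, mode=shape - 1.0, ell=ell, d_ell=d_ell, d2_ell=d2_ell)
# Exact gamma distribution function for integer shape (Erlang form).
exact = 1.0 - math.exp(-a) * sum(a ** j / math.factorial(j) for j in range(shape))
```

In this toy case the full formula is much closer to the exact value than the leading term alone, in line with the comparison of (8) and (9) made above.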

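We now outline the expression of the various components of tail probability approximation
(8) in terms of the original functions $b$, $\bar{b}$, $\ell$ and $\bar{\ell}$. To this end, note that $\check{y}^1$
is the first component of the value of $y$ maximizing $\bar{\ell}(y)$ subject to $y^2, \ldots, y^k$ being fixed
at their conditioned values $a^2, \ldots, a^k$, and let $\check{x}$ be the value of $x$ maximizing $\ell(x)$ subject to
$g^2(x) = a^2, \ldots, g^k(x) = a^k$. Then $\check{y}^1 = g^1(\check{x})$ and $\check{x} = \tilde{x}(\check{y}^1, a^2, \ldots, a^k)$. Hence,

$$r = \mathrm{sgn}\{a^1 - g^1(\check{x})\}\left(2[\ell(\check{x}) - \ell\{\tilde{x}(a^1, \ldots, a^k)\}]\right)^{1/2}.$$

Next, observe from (7) that

$$\frac{b^*(a^1)}{b^*(\check{y}^1)}
= \left(\frac{\det[\tilde{\Lambda}\{\tilde{x}(a^1, \ldots, a^k)\}]\,\det\{\Theta(\check{x})\}}
             {\det\{\tilde{\Lambda}(\check{x})\}\,\det[\Theta\{\tilde{x}(a^1, \ldots, a^k)\}]}\right)^{1/2}
  \frac{b\{\tilde{x}(a^1, \ldots, a^k)\}}{b(\check{x})}, \tag{10}$$

which is readily computed using values of $\ell_{ij}$ and the Lagrange multipliers $\lambda_\alpha(a^1, \ldots, a^k)$
$(\alpha = 1, \ldots, k)$ obtained in finding $\tilde{x}(a^1, \ldots, a^k)$.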

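Now, to compute $\ell^{*(1)}(y^1)$, it is convenient to work with the definition of $\ell^*(y^1)$
involving $\bar{\ell}$. Then,

$$\ell^{*(1)}(y^1) = \frac{d}{dy^1}\left[\bar{\ell}\{\tilde{y}(y^1, a^2, \ldots, a^k)\}\right]
= \bar{\ell}_i\{\tilde{y}(y^1, a^2, \ldots, a^k)\}\,\tilde{y}^i_1(y^1, a^2, \ldots, a^k),$$

where $\tilde{y}^i_j(y^1, \ldots, y^k) = \partial\tilde{y}^i/\partial y^j$ and the index $i$ runs from 1 to $p$. However,
$\tilde{y}^\alpha_1(y^1, a^2, \ldots, a^k)$ equals 1 for $\alpha = 1$ and zero for $\alpha = 2, \ldots, k$. Moreover,
$\bar{\ell}_{i'}\{\tilde{y}(y^1, a^2, \ldots, a^k)\} = 0$ for $i' = k + 1, \ldots, p$ since $\tilde{y}(y^1, a^2, \ldots, a^k)$ maximizes
$\bar{\ell}$ subject to the first $k$ components of $y$ being held fixed at the values $y^1, a^2, \ldots, a^k$,
respectively. Hence,

$$\ell^{*(1)}(y^1) = \bar{\ell}_1\{\tilde{y}(y^1, a^2, \ldots, a^k)\}.$$

Now, the Lagrangian for maximizing $\bar{\ell}(y)$ subject to the first $k$ components of $y$ being held
fixed is $\bar{H}(y) = \bar{\ell}(y) + \lambda_\alpha y^\alpha$, where the index $\alpha$ runs from 1 to $k$. Note that the Lagrange
multipliers $\lambda_\alpha(y^1, a^2, \ldots, a^k)$ $(\alpha = 1, \ldots, k)$ are the same as those for maximizing $\ell(x)$
subject to $g^1(x) = y^1, g^2(x) = a^2, \ldots, g^k(x) = a^k$. A Lagrange multiplier argument involving
$\bar{H}(y)$ yields $\lambda_\alpha(y^1, a^2, \ldots, a^k) = -\bar{\ell}_\alpha\{\tilde{y}(y^1, a^2, \ldots, a^k)\}$ $(\alpha = 1, \ldots, k)$. In particular, we
have

$$\ell^{*(1)}(y^1) = -\lambda_1(y^1, a^2, \ldots, a^k). \tag{11}$$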

The second derivative $\ell^{*(2)}(\check{y}^1)$, evaluated at the maximizer $\check{y}^1$ of $\ell^*(y^1)$, is harder to
compute, and there does not seem to be a closed form expression for it in general. However,
it is readily approximated numerically using the formula

$$\ell^{*(2)}(\check{y}^1) \approx
\frac{\ell\{\tilde{x}(\check{y}^1 + \delta, a^2, \ldots, a^k)\} - 2\ell\{\tilde{x}(\check{y}^1, a^2, \ldots, a^k)\} + \ell\{\tilde{x}(\check{y}^1 - \delta, a^2, \ldots, a^k)\}}{\delta^2} \tag{12}$$

for a small value of $\delta$. It is convenient in this instance to work with the definition of $\ell^*(y^1)$
involving $\ell(x)$. Further details concerning computation of (12) are given in Section 3. Tail
probability approximation (8) may then be computed.
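The central second difference in (12) can be sketched in a few lines of Python. The profile function below is a toy assumption (a bivariate normal log-density whose inner maximization over the nuisance coordinate has the closed form $x^2 = \rho x^1$); in general each evaluation of the profile would require a constrained maximization.

```python
def second_difference(profile, y, delta=1e-3):
    """Central second difference, the numerical form used in (12)."""
    return (profile(y + delta) - 2.0 * profile(y) + profile(y - delta)) / delta ** 2

RHO = 0.6  # hypothetical correlation for the toy log-density

def ell(x1, x2):
    # Bivariate normal log-density kernel with unit variances.
    return -(x1 * x1 - 2.0 * RHO * x1 * x2 + x2 * x2) / (2.0 * (1.0 - RHO ** 2))

def profile(y1):
    # Maximize ell over x2 with x1 held at y1; the maximizer is x2 = RHO * y1.
    return ell(y1, RHO * y1)

curvature = second_difference(profile, 0.0)  # profile(y1) = -y1^2 / 2, so this is near -1
```

The choice of $\delta$ trades truncation error against rounding error; for a quadratic profile such as this one the truncation error vanishes.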

An important special case of our approximation (8) occurs when $k = p$; that is,
when the number of conditioning variables is $p - 1$. This is the only case considered by
Skovgaard (1987) and Wang (1991). In that case, the marginalization step to approximate
the marginal density of $Y^1, \ldots, Y^k$ is not required and the function $\bar{\ell}$ and its derivatives are
easily specified. Therefore the conditional density
$f_{Y^1 \mid Y^2,\ldots,Y^p}(y^1 \mid Y^2 = a^2, \ldots, Y^p = a^p)$
is proportional to $\bar{b}(y^1, a^2, \ldots, a^p)\exp\{\bar{\ell}(y^1, a^2, \ldots, a^p)\}$. Consequently, approximation (8)
assumes the particularly simple form

$$\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^p = a^p) \approx \Phi(r) + \phi(r)
\left[\frac{1}{r} + \frac{\{-\bar{\ell}_{11}(\check{y}^1, a^2, \ldots, a^p)\}^{1/2}}{\bar{\ell}_1(a^1, \ldots, a^p)}\,
\frac{\bar{b}(a^1, \ldots, a^p)}{\bar{b}(\check{y}^1, a^2, \ldots, a^p)}\right], \tag{13}$$

where $r = \mathrm{sgn}(a^1 - \check{y}^1)[2\{\bar{\ell}(\check{y}^1, a^2, \ldots, a^p) - \bar{\ell}(a^1, \ldots, a^p)\}]^{1/2}$ and $\check{y}^1$ is the value
of $y^1$ maximizing $\bar{\ell}(y)$ subject to $y^2, \ldots, y^p$ being held fixed at their conditioned values,
$a^2, \ldots, a^p$, respectively.


3.  Computational  Details 

In this section, we outline some computational details for computing (8). Our aim is to
provide a formal algorithm for the method. Assume that the conditioning values $a^2, \ldots, a^k$
are specified, and we wish to approximate $\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^k = a^k)$ for various
values of $a^1$.

Step 1: Find $\tilde{x}$, the value of $x$ which maximizes $\ell(x)$ subject to the constraints
$g^1(x) = a^1, \ldots, g^k(x) = a^k$. Typically, this step involves the solution of a system of
$(p + k)$ non-linear equations in as many unknowns to obtain $\tilde{x}$ and the Lagrange multipliers
$\lambda_1(a^1, \ldots, a^k), \ldots, \lambda_k(a^1, \ldots, a^k)$ needed for later steps. Of course, the system of equations
referred to here is linear in the $\lambda_\alpha$'s, and so a little algebra usually results in the reduction
of the problem to that of solving a system of $p$ nonlinear equations in $p$ unknowns.

Step 2: Find $\check{x}$, the value of $x$ maximizing $\ell(x)$ subject to $g^2(x) = a^2, \ldots, g^k(x) = a^k$.
This step typically requires solution of $(p + k - 1)$ non-linear equations in as many unknowns.

Step 3: Form $r = \mathrm{sgn}\{a^1 - g^1(\check{x})\}[2\{\ell(\check{x}) - \ell(\tilde{x})\}]^{1/2}$, and hence approximation (9) to
the required conditional distribution function.

Step 4: Calculation of the factor $b^*(a^1)/b^*(\check{y}^1)$, where $\check{y}^1 = g^1(\check{x})$, follows readily from (10).
This step requires calculation of the derivatives $\ell_{ij}(x)$ and $g^\alpha_{ij}(x)$ $(\alpha = 1, \ldots, k)$. These derivatives
are readily calculated either analytically or numerically.

Step 5: The first derivative $\ell^{*(1)}(a^1)$ is given by $-\lambda_1(a^1, \ldots, a^k)$ from Step 1.

Step 6: The second derivative $\ell^{*(2)}(\check{y}^1)$ is calculated numerically using (12). In
order to calculate (12), the quantities $\tilde{x}(\check{y}^1 + \delta, a^2, \ldots, a^k)$ and $\tilde{x}(\check{y}^1 - \delta, a^2, \ldots, a^k)$ are
needed. In the former case, $\tilde{x}(\check{y}^1 + \delta, a^2, \ldots, a^k)$ is that value of $x$ maximizing $\ell(x)$ subject
to the constraints $g^1(x) = \check{y}^1 + \delta, g^2(x) = a^2, \ldots, g^k(x) = a^k$, and so may be found by
solving the system of equations from Step 1 except with $a^1$ replaced by $\check{y}^1 + \delta$. Similarly,
$\tilde{x}(\check{y}^1 - \delta, a^2, \ldots, a^k)$ may be found by solving the same system of equations with $a^1$ replaced
by $\check{y}^1 - \delta$. Approximation (8) may then be computed using (10), (11) and (12).

In implementing the algorithm described above we have used the packages Minpack
and NAG to solve the required systems of equations.
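The constrained maximization in Step 1 and the multiplier identity used in Step 5 can be sketched on a toy problem. The following Python code is an illustration under stated assumptions, not the Minpack or NAG implementation: it takes $p = 2$, $k = 1$, the hypothetical choices $\ell(x) = -\{(x^1)^2 + (x^2)^2\}/2$ and $g^1(x) = x^1 x^2$, and applies a plain Newton iteration to the $p + k = 3$ stationarity equations of $H(x) = \ell(x) + \lambda_1\{g^1(x) - y^1\}$.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def constrained_max(y1, start=(1.0, 1.0, 1.0), iters=50):
    """Newton's method on the stationarity equations of the Lagrangian H."""
    x1, x2, lam = start
    for _ in range(iters):
        F = [-x1 + lam * x2, -x2 + lam * x1, x1 * x2 - y1]
        J = [[-1.0, lam, x2], [lam, -1.0, x1], [x2, x1, 0.0]]
        d = solve_linear(J, [-f for f in F])
        x1, x2, lam = x1 + d[0], x2 + d[1], lam + d[2]
        if max(abs(v) for v in d) < 1e-12:
            break
    return x1, x2, lam

def ell_star(y1):
    # Profile of ell along the constraint g1(x) = y1 (additive constant omitted).
    x1, x2, _ = constrained_max(y1)
    return -0.5 * (x1 * x1 + x2 * x2)

x1, x2, lam = constrained_max(2.0)
h = 1e-4
deriv = (ell_star(2.0 + h) - ell_star(2.0 - h)) / (2.0 * h)  # compare with -lam, as in (11)
```

Here the numerical derivative of the profile agrees with $-\lambda_1$, so the multiplier returned by the Step 1 solver can be reused in Step 5 at no extra cost. A production implementation would replace the hand-written Newton loop with a robust nonlinear solver.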

4.  Examples 

4.1.  Conditional  inference  for  a  normal  mean  when  the  coefficient  of  variation  is  known. 

Let $W_1, \ldots, W_n$ be a sample from a normal distribution $N(\mu, c^2\mu^2)$, $\mu > 0$, with
known coefficient of variation $c > 0$. The mean $\mu$ is the parameter of interest. For
simplicity, let $c = 1$. Let $T = \bar{W}$ and $U = \sum(W_i - \bar{W})^2$. The statistic
$Z = g^2(T, U) = n^{1/2}T/(U + nT^2)^{1/2}$ is ancillary; see Hinkley (1977) and Lehmann (1991). Lehmann (1991,
p.549) gives the conditional density of $V = g^1(T, U) = (U + nT^2)^{1/2}$ given $Z = z$:

$$f_{V \mid Z}(v \mid Z = z) = k\,\mu^{-n}v^{n-1}\exp\{-\tfrac{1}{2}(\mu^{-1}v - n^{1/2}z)^2\}. \tag{14}$$

In this case, the number of conditioning variables, $k - 1$, is equal to $p - 1$, so the simpler
approximation (13) may be used to approximate conditional tail probabilities
$\mathrm{pr}(V \le v \mid Z = z)$. Of course, in this instance, tail probabilities can be computed easily through
numerical integration of the exact density (14), so this example serves merely to illustrate
the accuracy of our method. Suppose the true value of $\mu$ is 1. Then, the joint density
of $T$ and $U$ is proportional to $u^{(n-3)/2}\exp\{-\tfrac{1}{2}n(t - 1)^2 - \tfrac{1}{2}u\}$. We choose $b(t, u) = 1$
and $\ell(t, u) = -\tfrac{1}{2}n(t - 1)^2 - \tfrac{1}{2}u + \tfrac{1}{2}(n - 3)\log u$ here, rather than the more obvious choice
$b(t, u) = u^{(n-3)/2}$, $\ell(t, u) = -\tfrac{1}{2}n(t - 1)^2 - \tfrac{1}{2}u$, because in the latter case $\partial\ell(t, u)/\partial u$
never vanishes, meaning that $(\hat{t}, \hat{u})$ falls on the boundary of possible $(t, u)$ values. Now,
$\bar{b}(v, z) = 2n^{-1/2}v^2$ and $\bar{\ell}(v, z) = zvn^{1/2} - \tfrac{1}{2}n - \tfrac{1}{2}v^2 + (n - 3)\log v + \tfrac{1}{2}(n - 3)\log(1 - z^2)$. The
constrained maximizing point $\check{v} = \check{v}(z)$ of $\bar{\ell}(v, z)$ subject to $z$ being fixed at its conditioned
value may be computed algebraically here, so approximation (13) becomes

$$\mathrm{pr}(V \le v \mid Z = z) \approx \Phi(r) + \phi(r)
\left[\frac{1}{r} + \frac{(\check{v}^2 + n - 3)^{1/2}\,v^3}{(zvn^{1/2} - v^2 + n - 3)\,\check{v}^3}\right],$$

where $r = \mathrm{sgn}(v - \check{v})[2\{\bar{\ell}(\check{v}, z) - \bar{\ell}(v, z)\}]^{1/2}$ and
$\check{v} = \tfrac{1}{2}zn^{1/2}[1 + \{1 + 4z^{-2}n^{-1}(n - 3)\}^{1/2}]$.

Table 1 reports approximations to $\mathrm{pr}(V \le v \mid Z = z)$ when $n = 10$ and $z = 0.75$ for
various values of $v$. In this instance, our approximation is very close to exact values based
on numerical integration of (14). The simple approximation (9) tends to perform poorly.
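The comparison described in this example can be reproduced with a short Python sketch. It evaluates approximation (13) for $\mu = 1$ using the functions $\bar{b}(v, z) = 2n^{-1/2}v^2$ and $\bar{\ell}(v, z)$ given above (with additive constants dropped), and compares it with a Simpson's-rule integration of the exact density (14). The helper names, the integration limits and the grid size are assumptions made for illustration; the values it produces are not taken from Table 1.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ell_bar(v, z, n):
    # ell-bar(v, z) up to terms that do not depend on v.
    return z * v * math.sqrt(n) - 0.5 * v * v + (n - 3) * math.log(v)

def v_mode(z, n):
    # Positive root of v^2 - z n^{1/2} v - (n - 3) = 0, i.e. d ell-bar / dv = 0.
    s = z * math.sqrt(n)
    return 0.5 * (s + math.sqrt(s * s + 4.0 * (n - 3)))

def approx_13(v, z, n):
    """Return (full, leading) for pr(V <= v | Z = z); requires v != v_mode."""
    vm = v_mode(z, n)
    r = math.copysign(math.sqrt(2.0 * (ell_bar(vm, z, n) - ell_bar(v, z, n))), v - vm)
    corr = (math.sqrt(vm * vm + n - 3) * v ** 3
            / ((z * v * math.sqrt(n) - v * v + n - 3) * vm ** 3))
    return norm_cdf(r) + norm_pdf(r) * (1.0 / r + corr), norm_cdf(r)

def simpson(f, lo, hi, m=2000):
    # Composite Simpson's rule with an even number m of subintervals.
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def exact_cdf(v, z, n, upper=25.0):
    dens = lambda t: t ** (n - 1) * math.exp(-0.5 * (t - math.sqrt(n) * z) ** 2)
    return simpson(dens, 0.0, v) / simpson(dens, 0.0, upper)

n, z = 10, 0.75
full_lo, lead_lo = approx_13(3.0, z, n)
full_hi, lead_hi = approx_13(5.0, z, n)
exact_lo, exact_hi = exact_cdf(3.0, z, n), exact_cdf(5.0, z, n)
```

Running the sketch at a point on each side of the conditional mode gives a quick check that the full approximation tracks the numerically integrated values more closely than the leading term alone.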

Wang (1991) considers tail probabilities for the mean $n^{-1}\sum W_i^2$ given $Z = z$. His method,
which is based on a saddlepoint approximation to the joint distribution of several means,
is not valid in our case as it can only be used to find the conditional distribution of a linear
function of means, here the mean of $W^2$ itself, given $(p - 1)$ smooth functions of the means.

4.2.  Saddlepoint  approximations  and  an  application  to  the  conditional  bootstrap. 

Skovgaard (1987) and Wang (1991) consider the special case where $X = (X^1, \ldots, X^p)$
is a vector of means and the density $f_X(x)$ is approximated by a saddlepoint approximation.
In that setting, consider $n$ observations of a $p$-dimensional random vector
$W = (W_1, \ldots, W_p)$. Denote the cumulant generating function of $W$ by $K(T_1, \ldots, T_p)$. Then the
usual saddlepoint approximation to the joint density of $X = (\bar{W}_1, \ldots, \bar{W}_p)$ satisfies

$$f_X(x^1, \ldots, x^p) \propto |\Delta(x^1, \ldots, x^p)|^{-1/2}
\exp\Big[n\Big\{K(\hat{T}_1, \ldots, \hat{T}_p) - \sum_{i=1}^{p}\hat{T}_i x^i\Big\}\Big],$$

where the saddlepoint $(\hat{T}_1, \ldots, \hat{T}_p)$ satisfies

$$K_{T_i}(\hat{T}_1, \ldots, \hat{T}_p) = x^i \qquad (i = 1, \ldots, p),$$

$K_{T_i} = \partial K(T_1, \ldots, T_p)/\partial T_i$, and $\Delta = \{K_{T_iT_j}(\hat{T}_1, \ldots, \hat{T}_p)\}$ is the $p \times p$ matrix of
second-order partial derivatives $K_{T_iT_j}(T_1, \ldots, T_p) = \partial^2 K(T_1, \ldots, T_p)/\partial T_i\partial T_j$ $(i, j = 1, \ldots, p)$
evaluated at $\hat{T}_1, \ldots, \hat{T}_p$. General reviews of saddlepoint methods are given by Barndorff-Nielsen
and Cox (1979) and Reid (1988).
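For orientation, the saddlepoint density can be written out in the simplest case $p = 1$. The Python sketch below assumes, purely for illustration, that the observations are unit exponential, so that $K(T) = -\log(1 - T)$ and the saddlepoint equation $K'(\hat{T}) = x$ has the closed-form solution $\hat{T} = 1 - 1/x$; the normalizing factor $\{n/(2\pi)\}^{1/2}$ is included so that the result can be compared with the exact gamma density of the mean.

```python
import math

def saddlepoint_density_exp_mean(x, n):
    """Saddlepoint density of the mean of n unit exponentials (p = 1 sketch)."""
    t_hat = 1.0 - 1.0 / x                  # solves K'(T) = 1 / (1 - T) = x
    k = -math.log(1.0 - t_hat)             # K evaluated at the saddlepoint
    k2 = 1.0 / (1.0 - t_hat) ** 2          # K'' evaluated at the saddlepoint
    return math.sqrt(n / (2.0 * math.pi * k2)) * math.exp(n * (k - t_hat * x))

def exact_density_exp_mean(x, n):
    # The mean of n unit exponentials is gamma with shape n and rate n.
    return math.exp(n * math.log(n) + (n - 1) * math.log(x) - n * x - math.lgamma(n))

n = 5
ratio_a = saddlepoint_density_exp_mean(0.5, n) / exact_density_exp_mean(0.5, n)
ratio_b = saddlepoint_density_exp_mean(2.0, n) / exact_density_exp_mean(2.0, n)
```

In this special case the two ratios coincide, so the saddlepoint density is exact up to a constant factor; only proportionality is needed for the conditional calculations considered here.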

Approximation (8) can be used to approximate conditional tail probabilities
$\mathrm{pr}(Y^1 \le a^1 \mid Y^2 = a^2, \ldots, Y^k = a^k)$ where $Y^1 = g^1(X), \ldots, Y^k = g^k(X)$ are smooth functions of
the means. It is convenient in this instance to take $b(x^1, \ldots, x^p) = |\Delta(x^1, \ldots, x^p)|^{-1/2}$ and
$\ell(x^1, \ldots, x^p) = n\{K(\hat{T}_1, \ldots, \hat{T}_p) - \sum_{i=1}^{p}\hat{T}_i x^i\}$. Wang's (1991) method is only valid when
the function $g^1$ is linear. Tierney, Kass and Kadane (1989) give an approximate marginal
density formula for $Y^1, \ldots, Y^k$, from which application of (8) is straightforward.

We are particularly interested in applying (8) to estimate tail probabilities for the
conditional bootstrap. Monte Carlo simulation to estimate conditional bootstrap distribution
functions is extremely tedious, requiring careful stratification of bootstrap resamples
according to whether bootstrap versions of the conditioned variables approximately satisfy
the original conditioning or not; see Hinkley and Schechtman (1987) and Davison and
Hinkley (1988). In particular, a difficulty arises in deciding how close resamples need to
come to the original conditioning criteria to be retained in the simulation. However, recent
methods based on saddlepoint approximations have been proposed to approximate bootstrap
distribution functions which avoid the need for resampling; see Davison and Hinkley
(1988), Daniels and Young (1991), and DiCiccio, Martin and Young (1990). Analytical
approximations such as these are particularly important in the conditional bootstrap
context because they avoid the stratification problem discussed above. Davison and Hinkley
(1988) consider conditional bootstrap inference for the ratio $\theta = E(V)/E(U)$, where
$(U_i, V_i)$ $(i = 1, \ldots, n)$ are pairs with common distribution function $F$. They suggest a
suitable model for studying the conditional distribution of $T = \bar{V}/\bar{U}$ given $U_1, \ldots, U_n$ is
$V_i = \theta U_i + U_i\epsilon_i$, where the $\epsilon_i$ are independent errors with zero mean and variance $\sigma^2$. For
simplicity, let $\sigma = 1$. Then $\mathrm{Var}(T \mid u_1, \ldots, u_n) = \sigma^2 c$ where $c = (\sum u_i^2)/(\sum u_i)^2$. The
aim is to approximate the conditional bootstrap distribution of $T^* = \bar{V}^*/\bar{U}^*$ given the
bootstrap ancillary $A^* = (\sum U_i^{*2})/(\sum U_i^*)^2$, where $(U_i^*, V_i^*)$ $(i = 1, \ldots, n)$ is a resample
drawn at random with replacement from $(U_i, V_i)$ $(i = 1, \ldots, n)$. Davison and Hinkley find
approximations to

$$\mathrm{pr}(\bar{V}^*/\bar{U}^* \le t \mid A_1^* = a_1, A_2^* = a_2) = \mathrm{pr}(\bar{V}^* - t\bar{U}^* \le 0 \mid A_1^* = a_1, A_2^* = a_2),$$

where $A_1^* = n^{-1}\sum U_i^*$ and $A_2^* = n^{-1}\sum U_i^{*2}$, by applying Skovgaard's (1987) method.
They condition on both $A_1^*$ and $A_2^*$ reasoning that, since for their data $A^*$ is highly correlated
with $A_1^*$ and $A_2^*$, "redundancy of a conditioning variable is harmless". Note that
Skovgaard's method requires two conditioning variables in this case, because here $p = 3$
since $(X^1, X^2, X^3) = (\bar{U}^*, \bar{V}^*, \overline{U^{*2}})$. However, approximation (8) allows us to approximate
conditional tail probabilities $\mathrm{pr}(\bar{V}^*/\bar{U}^* \le t \mid A^* = a)$ directly.

Table 2 reports conditional tail probability approximations for $\bar{V}^*/\bar{U}^*$ for the data set
of size 25 reported by Davison and Hinkley (1988, Table 3). We have repeated Davison and
Hinkley's experiment with two conditioning variables $A_1^*$ and $A_2^*$ using approximation (13),
and we have also obtained approximations to $\mathrm{pr}(\bar{V}^*/\bar{U}^* \le t \mid A^* = a)$ using (8), which
could not be obtained using their method. In both cases, the joint cumulant generating
function of $(U_i^*, V_i^*, U_i^{*2})$ $(i = 1, \ldots, n)$ given $(U_i, V_i)$ $(i = 1, \ldots, n)$ is

$$K(T_1, T_2, T_3) = \log\Big\{n^{-1}\sum_{i=1}^{n}\exp(T_1U_i + T_2V_i + T_3U_i^2)\Big\}.$$

In the latter case, we consider the conditional distribution of $Y^1 = g^1(X^1, X^2, X^3) = X^2/X^1$
given $Y^2 = g^2(X^1, X^2, X^3) = n^{-1}X^3/(X^1)^2$. In each case, the derivatives of $K$
and $g^i$ are easily calculated algebraically or numerically and the constrained maximization
steps were carried out using Minpack's hybrd subroutine. In order to obtain the "exact"
probabilities for the first case, resamples from the simulation experiments were stratified
by requiring each bootstrap ancillary to be no more than one quarter of its standard
deviation from its observed data value. This requirement meant that only about 10% of
the bootstrap resamples drawn were actually retained in estimating the exact probability.
In the second case, resamples for which the bootstrap ancillary was no more than one tenth
of its standard deviation from its observed data value were retained, resulting in a 10%
retention rate again. The results reported in Table 2 are very encouraging. In particular, in
the latter case when there is only one conditioning variable, approximation (8) performs
very well, especially in the upper tail of the distribution. The simpler approximation
(9) fails badly in this case, particularly in the center of the distribution. In the case of
two conditioning variables, approximation (9) and approximation (13) both perform well,
especially in the lower tail.
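The stratified Monte Carlo benchmark with a single conditioning variable can be sketched as follows. This Python code is a rough illustration only: the data are synthetic (they are not the Davison and Hinkley data set), the function names are hypothetical, and the tolerance of one tenth of a standard deviation follows the description above.

```python
import random
import statistics

def ancillary(us):
    # Bootstrap ancillary: sum of squares over squared sum.
    return sum(u * u for u in us) / sum(us) ** 2

def ratio(us, vs):
    return sum(vs) / sum(us)

def stratified_bootstrap(us, vs, t, n_boot=5000, tol=0.1, seed=1):
    """Estimate pr(T* <= t | A* close to a) by retaining nearby resamples."""
    rng = random.Random(seed)
    n = len(us)
    a_obs = ancillary(us)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bu = [us[i] for i in idx]
        bv = [vs[i] for i in idx]
        draws.append((ancillary(bu), ratio(bu, bv)))
    sd = statistics.pstdev([a for a, _ in draws])
    # Keep resamples whose ancillary lies within tol standard deviations of a_obs.
    kept = [t_star for a, t_star in draws if abs(a - a_obs) <= tol * sd]
    retained = len(kept) / n_boot
    prob = sum(ts <= t for ts in kept) / len(kept) if kept else float("nan")
    return prob, retained

# Synthetic data, generated only to make the sketch runnable.
gen = random.Random(0)
us = [gen.uniform(1.0, 3.0) for _ in range(25)]
vs = [2.0 * u + u * gen.gauss(0.0, 1.0) for u in us]
prob, retained = stratified_bootstrap(us, vs, t=ratio(us, vs))
```

Only a small fraction of resamples survives the stratification, which is the practical cost that the analytical approximation avoids.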

Appendix 

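Proof of Proposition 1

First note that since $\bar\ell(y) = \ell\{x(y)\}$, $\tilde x = x(\tilde y)$ and $\hat x = x(\hat y)$, we have

$$\exp\{\ell(\tilde x) - \ell(\hat x)\} = \exp[\ell\{x(\tilde y)\} - \ell\{x(\hat y)\}] = \exp\{\bar\ell(\tilde y) - \bar\ell(\hat y)\}.$$

Next, observe that

$$\frac{\bar b(\tilde y)}{\bar b(\hat y)} = \frac{b\{x(\tilde y)\}\det[J\{x(\hat y)\}]}{b\{x(\hat y)\}\det[J\{x(\tilde y)\}]} = \frac{b(\tilde x)\det\{J(\hat y)\}}{b(\hat x)\det\{J(\tilde y)\}}.$$

Consequently, to show that (2) and (3) coincide, it remains to show

$$\frac{\det\{A(\tilde x)\}}{\det\{A(\hat x)\}\det\{\Theta(\tilde x)\}} = \frac{\det\{\Omega(\hat y)\}\det\{J(\hat y)\}^2}{\det\{\Omega'(\tilde y)\}\det\{J(\tilde y)\}^2}. \tag{A.1}$$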

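Note that since $\ell(x) = \bar\ell\{y(x)\}$, it follows that

$$\ell_i(x) = \bar\ell_k\{y(x)\}\,g^k_i(x) \qquad (i = 1, \ldots, p),$$

where the index $k$ runs from 1 to $p$, and

$$\ell_{ij}(x) = \bar\ell_{kl}\{y(x)\}\,g^k_i(x)\,g^l_j(x) + \bar\ell_k\{y(x)\}\,g^k_{ij}(x) \qquad (i, j = 1, \ldots, p), \tag{A.2}$$

where the indices $k$, $l$ run from 1 to $p$. Now, since $\ell_i(\hat x) = 0$ and $\bar\ell_k(\hat y) = 0$ $(i, k = 1, \ldots, p)$,
we have

$$-\ell_{ij}(\hat x) = -\bar\ell_{kl}(\hat y)\,g^k_i(\hat x)\,g^l_j(\hat x),$$

and hence, inverting and taking determinants on both sides,

$$\det\{A(\hat x)\} = [\det\{\Omega(\hat y)\}\det\{J(\hat y)\}^2]^{-1}. \tag{A.3}$$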
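
The determinant relation (A.3) can be checked numerically on a small example. The sketch below makes several assumptions that are not taken from the paper: $A$ is read as the inverse of the negative Hessian of the log density in $x$, $\Omega$ as the negative Hessian in $y$, and $J$ as the matrix of derivatives of $y$ with respect to $x$; the transformation `g` and the quadratic `ell_bar` are invented for illustration.

```python
import numpy as np


def g(x):
    """Illustrative smooth transformation y = g(x) (an assumption)."""
    return np.array([x[0] * x[1], x[1] / x[0]])


def jac_g(x):
    """Analytic matrix of first derivatives of g, with rows indexed by y."""
    return np.array([[x[1], x[0]],
                     [-x[1] / x[0] ** 2, 1.0 / x[0]]])


Q = np.array([[2.0, 0.5], [0.5, 1.0]])  # negative Hessian of ell_bar in y
x_hat = np.array([1.0, 2.0])
y_hat = g(x_hat)


def ell_bar(y):
    """Quadratic log density in y, maximized at y_hat."""
    d = y - y_hat
    return -0.5 * d @ Q @ d


def ell(x):
    """The same log density expressed in x."""
    return ell_bar(g(x))


def hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at x."""
    p = len(x)
    eye = np.eye(p)
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei, ej = eye[i] * h, eye[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H


A_hat = np.linalg.inv(-hessian(ell, x_hat))  # inverse negative Hessian in x
J_hat = jac_g(x_hat)

lhs = np.linalg.det(A_hat)
rhs = 1.0 / (np.linalg.det(Q) * np.linalg.det(J_hat) ** 2)
print(lhs, rhs)
```

The two printed values agree up to finite-difference error because the term involving second derivatives of `g` vanishes at the maximizer, exactly as in the step leading to (A.3).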
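Recall that $\tilde y$ maximizes $\bar\ell(y)$ subject to the first $k$ components of $y$ being fixed. Hence
$\bar\ell_{i'}(\tilde y) = 0$, for $i' = k+1, \ldots, p$. Consequently, from (A.2) it follows that

$$\ell_{ij}(\tilde x) = \bar\ell_{kl}(\tilde y)\,g^k_i(\tilde x)\,g^l_j(\tilde x) + \bar\ell_\alpha(\tilde y)\,g^\alpha_{ij}(\tilde x)$$

$$\phantom{\ell_{ij}(\tilde x)} = \bar\ell_{kl}(\tilde y)\,g^k_i(\tilde x)\,g^l_j(\tilde x) - \lambda_\alpha\,g^\alpha_{ij}(\tilde x),$$

where the index $\alpha$ runs from 1 to $k$, and $\lambda_\alpha$ $(\alpha = 1, \ldots, k)$ are the Lagrange multipliers
for the constrained maximization step to obtain $\tilde y$. Note as well that $\lambda_\alpha$ $(\alpha = 1, \ldots, k)$ are
also the Lagrange multipliers for finding $\tilde x$. Thus,

$$\bar\ell_{kl}(\tilde y)\,g^k_i(\tilde x)\,g^l_j(\tilde x) = \ell_{ij}(\tilde x) + \lambda_\alpha\,g^\alpha_{ij}(\tilde x) = H_{ij}(\tilde x).$$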

Matrix inversion on both sides yields

$$\{\Omega(\tilde y)\}^{-1} = J(\tilde y)\,A(\tilde x)\,J(\tilde y)^{T}, \tag{A.4}$$

and hence

$$[\det\{\Omega(\tilde y)\}]^{-1} = \det\{A(\tilde x)\}\det\{J(\tilde y)\}^2. \tag{A.5}$$

Equation (A.4) implies that $\Theta(\tilde x)$ is the $k \times k$ submatrix in the upper left corner of the
inverse of $\Omega(\tilde y)$. Therefore, by the formula for the determinant of a partitioned inverse
(Draper and Smith, 1981, p. 127),

$$\det\{\Omega^{-1}(\tilde y)\} = \det\{\Theta(\tilde x)\}\det[\{\Omega'(\tilde y)\}^{-1}]. \tag{A.6}$$

Hence, from (A.5) and (A.6) it follows that

$$\frac{\det\{A(\tilde x)\}}{\det\{\Theta(\tilde x)\}} = \frac{1}{\det\{\Omega'(\tilde y)\}\det\{J(\tilde y)\}^2}.$$

Combining (A.3) and (A.6), (A.1) obtains, and the proof is complete.
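
The partitioned-inverse determinant formula behind (A.6) is easy to confirm numerically. In the sketch below the matrix is random and the variable names are ours; it is assumed, as a reading of the notation rather than a statement from the paper, that the primed matrix is the lower right $(p-k) \times (p-k)$ block of the full matrix, while the $k \times k$ matrix is the upper left block of its inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 5, 2

# Random symmetric positive definite matrix standing in for the full p x p matrix.
B = rng.normal(size=(p, p))
Omega = B @ B.T + p * np.eye(p)

Omega_inv = np.linalg.inv(Omega)
Theta = Omega_inv[:k, :k]     # upper left k x k block of the inverse
Omega_prime = Omega[k:, k:]   # lower right (p - k) x (p - k) block of the matrix itself

lhs = np.linalg.det(Omega_inv)
rhs = np.linalg.det(Theta) / np.linalg.det(Omega_prime)
print(lhs, rhs)
```

The agreement follows from the Schur complement: the upper left block of the inverse is the inverse of the Schur complement of the lower right block.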
Table 1

Conditional tail probabilities pr(V < v | Z = 0.75) for Example 4.1, n = 10.

  v      Exact     Approximation (9)     Approximation (13)

 1.5     0.0000         0.0003                0.0000
 2.0     0.0005         0.0036                0.0004
 2.5     0.0057         0.0234                0.0048
 3.0     0.0331         0.0910                0.0300
 3.5     0.1190         0.2394                0.1125
 4.0     0.2920         0.4595                0.2828
 4.5     0.5264         0.6878                0.5174
 5.0     0.7471         0.8575                0.7409
 5.5     0.8949         0.9494                0.8918
 6.0     0.9664         0.9862                0.9653
 6.5     0.9918         0.9971                0.9915
 7.0     0.9985         0.9995                0.9984
 7.5     0.9998         0.9999                0.9998
 8.0     1.0000         1.0000                1.0000
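
As a reading aid for Table 1, the largest absolute discrepancy between each approximation and the exact probability can be computed directly from the tabulated values. The helper name `max_abs_error` is ours, and the value of v in the fifth row is taken to be 3.5 from the regular spacing of the grid.

```python
# Values transcribed from Table 1 (v = 1.5, 2.0, ..., 8.0).
v = [1.5 + 0.5 * i for i in range(14)]
exact = [0.0000, 0.0005, 0.0057, 0.0331, 0.1190, 0.2920, 0.5264,
         0.7471, 0.8949, 0.9664, 0.9918, 0.9985, 0.9998, 1.0000]
approx_9 = [0.0003, 0.0036, 0.0234, 0.0910, 0.2394, 0.4595, 0.6878,
            0.8575, 0.9494, 0.9862, 0.9971, 0.9995, 0.9999, 1.0000]
approx_13 = [0.0000, 0.0004, 0.0048, 0.0300, 0.1125, 0.2828, 0.5174,
             0.7409, 0.8918, 0.9653, 0.9915, 0.9984, 0.9998, 1.0000]


def max_abs_error(approx, reference):
    """Return the largest |approx - reference| and the index where it occurs."""
    errors = [abs(a - r) for a, r in zip(approx, reference)]
    worst = max(range(len(errors)), key=errors.__getitem__)
    return errors[worst], worst


err_9, i_9 = max_abs_error(approx_9, exact)
err_13, i_13 = max_abs_error(approx_13, exact)
print(f"(9):  max error {err_9:.4f} at v = {v[i_9]}")
print(f"(13): max error {err_13:.4f} at v = {v[i_13]}")
```

For these tabulated values, the largest discrepancy is 0.1675 for approximation (9) and 0.0092 for approximation (13), both at v = 4.0.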
Table 2

Approximations to conditional probabilities pr($V^*/U^* < t \mid A_1^* = 147.9, A_2^* = 43120$)
and pr($V^*/U^* < t \mid A^* = 0.07885$) for n = 25 pairs of Example 4.2. The conditioned
values chosen are the values of $A_1$, $A_2$ and $A$ from the data.

           Two conditioning variables        One conditioning variable
   t       Exact†     (9)       (13)         Exact†     (9)       (12)

  7.8      0.0001    0.0001    0.0001        0.0003    0.0008    0.0004
  7.9      0.0002    0.0002    0.0002        0.0006    0.0016    0.0009
  8.0      0.0008    0.0007    0.0007        0.0013    0.0033    0.0020
  8.1      0.0020    0.0020    0.0020        0.0029    0.0066    0.0042
  8.2      0.0050    0.0051    0.0051        0.0061    0.0126    0.0083
  8.3      0.0115    0.0118    0.0117        0.0125    0.0231    0.0157
  8.4      0.0247    0.0252    0.0249        0.0243    0.0405    0.0289
  8.5      0.0488    0.0497    0.0489        0.0450    0.0684    0.0511
  8.6      0.0888    0.0905    0.0892        0.0794    0.1108    0.0868
  8.7      0.1497    0.1528    0.1506        0.1328    0.1715    0.1411
  8.8      0.2352    0.2394    0.2363        0.2107    0.2534    0.2187
  8.9      0.3432    0.3492    0.3452        0.3155    0.3565    0.3219
  9.0      0.4664    0.4752    0.4709        0.4433    0.4763    0.4456
  9.1      0.5948    0.6059    0.6018        0.5835    0.6031    0.5863
  9.2      0.7151    0.7276    0.7243        0.7185    0.7241    0.7188
  9.3      0.8157    0.8287    0.8266        0.8297    0.8264    0.8298
  9.4      0.8920    0.9032    0.9021        0.9089    0.9023    0.9092
  9.5      0.9429    0.9514    0.9511        0.9573    0.9513    0.9576
  9.6      0.9727    0.9786    0.9786        0.9828    0.9786    0.9827
  9.7      0.9883    0.9918    0.9918        0.9936    0.9918    0.9938
  9.8      0.9955    0.9973    0.9973        0.9980    0.9972    0.9981
  9.9      0.9984    0.9992    0.9992        0.9995    0.9992    0.9995
 10.0      0.9995    0.9998    0.9998        0.9999    0.9998    0.9999
 10.1      0.9999    1.0000    1.0000        1.0000    0.9999    1.0000
 10.2      1.0000    1.0000

† "Exact" probabilities based on 500,000 retained bootstrap resamples.